# INT8 Quantized Inference
## Gte Multilingual Reranker Base Onnx Op14 Opt Gpu Int8

Quantized ONNX version of Alibaba-NLP/gte-multilingual-reranker-base, using INT8 quantization and optimized for GPU execution; suitable for text classification (reranking) tasks.

- License: MIT
- Task: Text Embedding
- Tags: Other
- Author: JustJaro
- Downloads: 91 · Likes: 1
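A minimal usage sketch for an INT8 ONNX reranker of this kind, assuming the quantized graph is stored locally as `model_int8.onnx` and emits one relevance logit per query-document pair (both are assumptions; check the repository for the actual file name and output shape):

```python
# Hedged sketch: score query-document pairs with an INT8 ONNX reranker on GPU.
# "model_int8.onnx" and the single-logit output layout are assumptions.
import onnxruntime as ort
from transformers import AutoTokenizer

# Tokenizer comes from the original, non-quantized base model repository.
tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-multilingual-reranker-base")

session = ort.InferenceSession(
    "model_int8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

query = "what is the capital of France?"
docs = [
    "Paris is the capital and largest city of France.",
    "Bananas are rich in potassium.",
]

enc = tokenizer([query] * len(docs), docs, padding=True, truncation=True, return_tensors="np")
# Feed only the inputs the exported graph declares (some exports drop token_type_ids).
feed = {i.name: enc[i.name] for i in session.get_inputs() if i.name in enc}
logits = session.run(None, feed)[0]
print(logits.squeeze(-1))  # higher logit = more relevant document
```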
## Qwen2.5 VL 3B Instruct Quantized.w8a8

Quantized version of Qwen/Qwen2.5-VL-3B-Instruct that accepts combined image-and-text input and produces text output, with both weights and activations quantized to INT8 (W8A8).

- License: Apache-2.0
- Task: Image-to-Text
- Tags: Transformers, English
- Author: RedHatAI
- Downloads: 274 · Likes: 1
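W8A8 checkpoints like this are commonly loaded through vLLM. The sketch below assumes that path; the repository id is inferred from the listed author and model name and may differ, and the prompt format is illustrative (image inputs would go through vLLM's multimodal API rather than a plain string prompt):

```python
# Hedged sketch: text-only generation with the W8A8 checkpoint via vLLM.
# The repository id is inferred from the listing (author + model name); verify it.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Qwen2.5-VL-3B-Instruct-quantized.w8a8", max_model_len=4096)

prompt = (
    "<|im_start|>user\n"
    "In one sentence, what does W8A8 quantization mean?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=64))
print(out[0].outputs[0].text)
```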
## DeepSeek R1 Distill Qwen 32B Quantized.w8a8

Quantized version of DeepSeek-R1-Distill-Qwen-32B; INT8 weight and activation quantization (W8A8) reduces memory requirements and improves computational efficiency.

- License: MIT
- Task: Large Language Model
- Tags: Transformers
- Author: RedHatAI
- Downloads: 3,572 · Likes: 11
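A rough illustration of why W8A8 helps at this scale: storing each weight in one byte takes about half the memory of BF16 (two bytes per weight). The parameter count below is an approximation, and the estimate ignores activations, KV cache, and quantization scales:

```python
# Back-of-envelope weight-memory estimate (illustration only, not a measurement).
params = 32.5e9  # approximate parameter count for a Qwen2.5-32B-based model

bf16_gib = params * 2 / 1024**3  # 2 bytes per weight
int8_gib = params * 1 / 1024**3  # 1 byte per weight (ignores scales/zero-points)

print(f"BF16 weights: ~{bf16_gib:.0f} GiB")  # ~61 GiB
print(f"INT8 weights: ~{int8_gib:.0f} GiB")  # ~30 GiB
```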